DOCUMENT RESUME

ED 186 471                                                  TM 800 176

AUTHOR          Patience, Wayne M.; Reckase, Mark D.
TITLE           Operational Characteristics of a One-Parameter Tailored Testing Procedure. Research Report 79-2.
INSTITUTION     Missouri Univ., Columbia. Tailored Testing Research Lab.
SPONS AGENCY    Office of Naval Research, Arlington, Va. Personnel and Training Research Programs Office.
PUB DATE        Oct 79
CONTRACT        N00014-77-C-0097; NR-150-395
NOTE            39p.
EDRS PRICE      MF01/PC02 Plus Postage.
DESCRIPTORS     *Computer Assisted Testing; *Difficulty Level; Error of Measurement; Item Analysis; *Item Banks; *Latent Trait Theory; Mathematical Models; *Simulation; Statistical Bias; Test Construction; *Test Items; Test Reliability
IDENTIFIERS     FORTRAN Programming Language; Maximum Likelihood Estimation; Monte Carlo Methods; Rasch Model; *Tailored Testing



ABSTRACT

An experiment was performed with computer-generated data to investigate some of the operational characteristics of tailored testing as they are related to various provisions of the computer program and item pool. With respect to the computer program, two characteristics were varied: the size of the step of increase or decrease in item difficulty for successive items, and the range in difficulty levels within which items might be considered acceptably close to a specified level. With respect to item pools, two characteristics were varied: the number of items in the pool, and the shape of the item difficulty distribution. Simulated test data were generated by computer for various values of the four parameters (step size, acceptance range, number of items, and item difficulty distribution) and for various hypothetical ability levels from plus three to minus three. The resulting expected values and standard errors were tabulated and are presented as a guide for those involved in setting up tailored testing procedures. (Author/CTM)



* Reproductions supplied by EDRS are the best that can be made from the original document. *



Operational Characteristics of a One-Parameter 

Tailored Testing Procedure 



Wayne M. Patience

and

Mark D. Reckase 



Research Report 79-2
October 1979 



Tailored Testing Research Laboratory 
Educational Psychology Department 
University of Missouri 
Columbia MO 65211 





Prepared under contract No. N00014-77-C-0097, NR 150-395
with the Personnel and Training Research Programs 
Psychological Sciences Division 
Office of Naval Research 

Approved for public release; distribution unlimited 
Reproduction in whole or in part is permitted for
any purpose of the United States Government






SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

REPORT DOCUMENTATION PAGE (READ INSTRUCTIONS BEFORE COMPLETING FORM)

1. REPORT NUMBER: 79-2

4. TITLE (and Subtitle): Operational Characteristics of a One-Parameter Tailored Testing Procedure

5. TYPE OF REPORT & PERIOD COVERED: Technical Report

6. PERFORMING ORG. REPORT NUMBER:

7. AUTHOR(s): Wayne M. Patience and Mark D. Reckase

8. CONTRACT OR GRANT NUMBER(s): N00014-77-C-0097

9. PERFORMING ORGANIZATION NAME AND ADDRESS:
   Department of Educational Psychology
   University of Missouri
   Columbia, Missouri 65201

10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS:
    P.E.: 61153N    Proj.: RR042
    T.A.: 0 2-04-01 04
    W.U.: NR 150-395

11. CONTROLLING OFFICE NAME AND ADDRESS:
    Personnel and Training Research Programs
    Office of Naval Research
    Arlington, Virginia 22217

12. REPORT DATE: October 1979

13. NUMBER OF PAGES:

14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office):

15. SECURITY CLASS. (of this report): Unclassified

15a. DECLASSIFICATION/DOWNGRADING SCHEDULE:

16. DISTRIBUTION STATEMENT (of this Report):
    Approved for public release; distribution unlimited. Reproduction in whole or in part is permitted for any purpose of the United States Government.

17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report):

18. SUPPLEMENTARY NOTES:

19. KEY WORDS (Continue on reverse side if necessary and identify by block number):
    Testing
    Ability Testing
    Latent Trait Models
    Rasch Model
    Tailored Testing
    Computerized Testing

20. ABSTRACT (Continue on reverse side if necessary and identify by block number)

While numerous articles have appeared in the literature which describe the one-parameter logistic model and its application in a tailored testing setting, little or no research has been conducted on the operational characteristics of the procedure when program parameters and item pool attributes are varied. The primary objective of this investigation was to determine the effects of varying the program parameters, stepsize and acceptance range, as well as the item pool attributes, size and shape, on the bias and standard error of the maximum likelihood ability estimates obtained from tailored tests. Specifically, two main research questions were addressed. First, what values of stepsize and acceptance range provided the least bias and smallest standard error of ability estimates? The stepsize program parameter controlled the magnitude of movement through the item pool during the initial item selection phase of tailored testing. The acceptance range program parameter specified how deviant the selected item's difficulty value could be from the requested item difficulty and still be chosen for administration. Secondly, what shape and size of item difficulty distribution provided the least bias and standard error of ability estimates across the range of the latent trait? Two FORTRAN programs were used for investigating the effects of program parameters and item pool attributes. Both programs took as input the stepsize, acceptance range, item difficulty values for the various sizes and shapes of item pools, and the true abilities for which estimates were to be made. The first program, TREEIP, produced the propensity distribution, the probability distribution for observed ability estimates given a true ability, $\theta$, and provided output of the $E(\hat{\theta})$ and $\sqrt{\mathrm{Var}(\hat{\theta})}$. The other program, SIMIP, was developed to overcome the limitation on the size of item pool which could be investigated at a reasonable cost using the TREEIP program. The SIMIP program provided output of the mean and standard deviation, S.D.($\hat{\theta}$), of the ability estimates from a specified number of simulated tailored tests assuming a given $\theta$. The results of the study were drawn from tables which summarized the output of the TREEIP and SIMIP programs. In addition to the recommendations regarding the research questions stated above, an effort was made to discuss the interaction of the variables of stepsize, acceptance range, item pool size and the shape of the distribution of item pool difficulties. Results suggested that each of these variables played a substantial role in affecting the magnitude of statistical bias and standard error at various points along the ability continuum. The results were presented as a guide for those involved in setting up a tailored testing procedure. The intent was to provide figures and tables to facilitate applications of tailored testing procedures such that a minimum of bias and standard error of ability estimates could be attained.




CONTENTS

Introduction
Purpose
Programs
Research Design
Results
    Item Pool Shape
    Item Pool Size
    Stepsize
    Acceptance Range
    Secondary Results
Discussion
    Item Pool Shape
    Item Pool Size
    Stepsize
    Acceptance Range
    TREEIP
    SIMIP
Summary and Conclusions
References
Appendix A






OPERATIONAL CHARACTERISTICS OF A ONE-PARAMETER
TAILORED TESTING PROCEDURE

Tailored testing, [illegible] scoring of test items administered in an interactive fashion to individual examinees, has within the [illegible] the spearhead for application of latent trait models to achievement and ability measurement. The availability of improved computer technology has contributed greatly to the increase in the number of [illegible] administer tailored (adaptive) tests. It should be noted that tailored testing as presented here is synonymous with many other assigned names such as adaptive testing, response contingent testing, [illegible]. [illegible] of Missouri is based on the one-parameter logistic model.

While numerous articles have appeared in the literature which describe the one-parameter logistic model and its application in a tailored testing setting (see, for example, Reckase, 1974; Weiss, 1974; Patience, 1977), [illegible] discussing operational characteristics of the tailored testing procedure when program parameters and item pool attributes are varied. [illegible] operational characteristics refer to how well the tailored testing procedure estimates a given true ability. [illegible] program parameters will be described in detail shortly.

Although no literature was found which addressed the effects of varying [illegible], [illegible] the literature [illegible] item pool attributes on the operation of tailored testing. Jensema (1975), for example, has investigated the influence of item pool size and item characteristics on a Bayesian tailored [illegible] have very large item pools. Reckase (1976) [illegible] recommending a rectangular distribution [illegible]. [illegible] the tailored testing procedure was based on an empirical maximum likelihood estimation of the ability parameter of the simple logistic (Rasch) model. Issues worthy of further investigation have surfaced in addition to item pool attributes, [illegible] effects of program parameters on the bias and variance of [illegible].



[illegible] appeared in the literature which use the phrase "[illegible] testing ability estimation" to mean procedural bias toward subgroups of an examinee population such as minorities (see, for example, Pine and Weiss, 1978). The research reported here did not address this type of bias. Rather, ability estimate bias, as investigated by this paper, was concerned with whether the expected values of the maximum likelihood ability estimates were equal to the known true ability. In this sense, the attempt was to identify values for the program parameters and the item pool characteristics which would provide the least statistical bias in ability estimation. The variance of ability estimates was the squared standard error of the ability estimates for a known true ability. The desire was to minimize this standard error. These two dependent measures provided the criteria for judging how well the tailored testing procedure estimated known abilities when the program parameters and item pool characteristics were varied.
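The two dependent measures described above can be written compactly. This is a notational sketch only: the labels $\mathrm{Bias}$ and $\mathrm{SE}$ are assumed here for illustration and are not taken from the report, with $\theta$ the known true ability and $\hat{\theta}$ the maximum likelihood ability estimate.

$$\mathrm{Bias}(\hat{\theta} \mid \theta) = E(\hat{\theta} \mid \theta) - \theta, \qquad \mathrm{SE}(\hat{\theta} \mid \theta) = \sqrt{\mathrm{Var}(\hat{\theta} \mid \theta)}$$

Read this way, a combination of program parameters and item pool attributes is preferred when both quantities are close to zero at each true ability of interest.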



Purpose

The primary purpose of the research described herein was to determine the operational characteristics of a one-parameter tailored testing procedure when program parameters and item pool attributes were varied. The program parameters investigated were the stepsize and acceptance range. The stepsize parameter specified the magnitude of movement of the ability estimate during the initial item selection phase of tailored testing. After the initial phase, maximum likelihood ability estimation was used. The acceptance range parameter determined how deviant the selected item's difficulty value could be from the requested item difficulty and still be acceptable for administration. In the tailored test, items were requested by the procedure to match the ability estimate computed from previous item responses. The item pool attributes varied were size, shape, and quality. Each of these variables will now be described more specifically.



The premise of tailored testing is that when an examinee answers an item correctly, the next item administered should be more difficult, and when an examinee answers an item incorrectly, the next item should be less difficult. The stepsize program parameter initially controlled how much more difficult or easy the next item administered was. The selection of items was controlled by the fixed stepsize until the examinee had answered items both correctly and incorrectly. After both a correct and an incorrect response had been obtained in the response string, a maximum likelihood ability estimate was obtained using an iterative search for the mode of the likelihood distribution. For a more complete description of the item selection and ability estimation components of this maximum likelihood tailored testing procedure see Patience (1977). In the past, arbitrary values have generally been chosen for the stepsize. One of the primary goals of this research was to empirically investigate the effects of stepsize values on the bias and standard error of ability estimates. In so doing, the intent was to determine the optimal stepsize value which would minimize the bias and standard error of ability estimates.

The second program parameter investigated was the acceptance range. The acceptance range specified the amount of deviation in difficulty an administered item could have from the requested item difficulty and still be acceptable for administration. The acceptance range parameter monitored the appropriateness of items selected throughout the tailored test, i.e., both during item selection based on the fixed stepsize until both correct and incorrect responses had been obtained, and also during item selection to maximize the information function for a maximum likelihood ability estimate. [illegible] plus or minus the acceptance range [illegible] a difficulty value nearest the requested [illegible]. If no item was available from the pool within the acceptance range of the difficulty requested, the tailored test was terminated. The primary aim regarding the acceptance range, then, was to determine what value or range of values yielded the least bias and standard error of ability estimates. Clearly, a small value for the acceptance range would have insured that items very near the desired item difficulty were administered. On the other hand, too small an acceptance range [illegible] premature termination of the tailored test [illegible] bias of the ability estimate. [illegible] stepsize and acceptance range interact [illegible]; therefore, a choice of what values are optimal may not be made assuming independence of these controlling factors.
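The acceptance range rule can be sketched in a few lines of Python. This is a minimal illustration, not the report's FORTRAN: the function name `select_item`, the inclusive boundary, and the nine-item pool are assumptions made here for the example.

```python
from typing import Optional, Sequence


def select_item(requested: float,
                available: Sequence[float],
                acceptance_range: float) -> Optional[float]:
    """Return the available difficulty nearest to `requested`.

    Returns None when even the nearest item lies farther than
    `acceptance_range` from the requested difficulty, which is the
    condition under which the tailored test stops.
    """
    if not available:
        return None
    nearest = min(available, key=lambda b: abs(b - requested))
    # Assumption: an item exactly at the boundary is still acceptable.
    if abs(nearest - requested) <= acceptance_range:
        return nearest
    return None


# Hypothetical nine-item rectangular pool from -3 to +3.
pool = [-3.0, -2.25, -1.5, -0.75, 0.0, 0.75, 1.5, 2.25, 3.0]
print(select_item(1.0, pool, 0.3))    # 0.75 is 0.25 away, so it is accepted
print(select_item(0.375, pool, 0.3))  # nearest item is 0.375 away, so None
```

A narrower acceptance range makes the `None` outcome more frequent, which is the premature termination that the paragraph above warns about.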

The item pool attributes varied in this research included size, shape, and quality. Simulated item pools used in this investigation ranged in [illegible]. [illegible] distributions were normal, rectangular, bimodal, and skewed. Item pool quality referred to the [illegible]. Idealized pools consisted of item difficulty parameters equally spaced from -3 to +3. [illegible] consisted of item difficulty values (minus one times each of the item easiness values) obtained from calibration runs using [illegible] (1969) calibration program based on the Rasch model. In these pools, items were not equally spaced on the difficulty [illegible]. [illegible] contained 72 items while the other [illegible]. The 72 item pool consisted of item difficulty parameter estimates from [illegible] of vocabulary tests. This pool was labeled WIPL. The other pool was constructed using item difficulty parameter estimates from the calibrations of tests covering the evaluation techniques portion of an introductory measurement and evaluation course. [illegible] distributions of item difficulty for [illegible] were graphed and appear in Appendix A. It should be noted that item pool attributes played a substantial role in the utility of the tailored testing procedure.



Programs



Two FORTRAN programs were used for investigating effects of program parameters and item pool attributes. The input variables for both programs included: a) acceptance range, b) stepsize, c) item pool size, d) item difficulty values for the various sizes and shapes of item pools, and e) the true abilities for a set of hypothetical examinees. Both programs output the mean and standard deviation of the estimates of each ability provided. These served as dependent measures for determination of the quality of estimation for the specific values of the acceptance range, stepsize, and item pool parameter set.



The first program, the TREEIP, was based on the concept of a propensity distribution. A propensity distribution in this context was defined as the probability distribution for observed ability estimates given a true ability, $P(\hat{\theta} \mid \theta)$ (Lord and Novick, 1968). The concept of a propensity distribution was extended from its use in true score theory to the context of latent trait ability estimation. The TREEIP program determined the propensity distribution for a given true ability, $\theta$, analytically from the properties of the tailored testing model.

Briefly, the TREEIP program operated as follows. Initially an item of average difficulty was administered to the simulated examinee with known true ability. Based on the probability function for the simple logistic model,

$$P(u \mid \theta, b) = \frac{e^{u(\theta - b)}}{1 + e^{\theta - b}}$$

where $u$ is the item score (0 or 1), $b$ is the item difficulty parameter, and $\theta$ is the ability parameter, the probability of a correct and the probability of an incorrect response were obtained. If the response were correct, the ability estimate was increased by the stepsize. If the response were incorrect, the ability estimate was decreased by the stepsize. Thus after one item was administered, two paths or branches were present on the "tree". (The tree diagram from probability theory was employed to represent the propensity distribution in this study.) Based on these first possible ability estimates, the closest item to each of the two estimates was selected for administration with the constraint that the difficulty of the items must have been within plus or minus the acceptance range from the present ability estimates. If no items were available, that branch was terminated at that point. However, assuming items were available, there existed four possible paths after the second item had been administered. As long as all correct or all incorrect responses were obtained on a given path, the ability estimates continued to be increased or decreased, respectively, by the stepsize. However, when both a correct and an incorrect response were present on a particular path of the tree, a maximum-likelihood ability estimation procedure obtained an ability estimate using an iterative search for the mode of the likelihood distribution.
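The probability function and the maximum-likelihood step can be illustrated with a short Python sketch. The report does not say which iterative search was used, so bisection on the likelihood equation is an assumption here, as are the function names `p_correct` and `mle_ability` and the search bounds.

```python
import math


def p_correct(theta: float, b: float) -> float:
    """Simple logistic (Rasch) probability of a correct response (u = 1)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def mle_ability(difficulties, responses, lo=-10.0, hi=10.0, tol=1e-8):
    """Maximum-likelihood ability estimate for a mixed response string.

    Assumption: bisection is used as the iterative search. The derivative
    of the Rasch log-likelihood is sum(u_i - P_i), which decreases in
    theta, so its single root is the mode of the likelihood.
    """
    if not 0 < sum(responses) < len(responses):
        raise ValueError("estimate is infinite without both a correct "
                         "and an incorrect response")

    def score(theta):
        return sum(u - p_correct(theta, b)
                   for b, u in zip(difficulties, responses))

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid      # likelihood still rising: the mode lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Ability 0.0 facing an item of difficulty 0.75.
print(round(p_correct(0.0, 0.75), 2))                 # 0.32
# Item 0.0 answered correctly, item 0.75 answered incorrectly.
print(round(mle_ability([0.0, 0.75], [1, 0]), 3))     # 0.375
```

The `ValueError` branch mirrors the reason the fixed stepsize is needed: with only correct or only incorrect responses the likelihood has no finite mode.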

To partially illustrate how the propensity distribution was determined by the TREEIP, Figure 1 shows a diagram representing the operation of the procedure on a nine item rectangular pool. The stepsize used for this illustration was 1.0 and the acceptance range was 0.3. The $\theta$ for this analytical derivation of the propensity distribution was set at zero. As was pointed out above, the procedure began by administering an item of average difficulty from the pool, i.e., the item with the difficulty parameter 0.0. The probability of a correct response, as determined by the probability function given above for the simple logistic model, was 0.5 and the probability of an incorrect response was 0.5.

After a correct response the ability estimate was increased by the stepsize. After an incorrect response, it was decreased by the stepsize. After one item, the ability estimate was either 1.0 with a probability of 0.5 or -1.0 with a probability of 0.5. This procedure was followed so that finite ability estimates would be available after each item response rather than the ±∞ value given by the maximum-likelihood procedure. The expected value of the distribution after one item was 0.0 and the standard deviation was 1.0.

Figure 1

Procedural Operation of TREEIP
on a Nine Item Pool with
Stepsize = 1.0 and Acceptance Range = 0.3

Item parameters in the pool: 3.00, 2.25, 1.50, 0.75, 0.00, -0.75, -1.50, -2.25, -3.00

                      After first item            After second item
                      Estimate (Item Selected)    Estimate (Item Selected)

                       1.00  (0.75)                2.00   (2.25)
                                                   0.375  ( * )
                      -1.00  (-0.75)              -0.375  ( * )
                                                  -2.00   (-2.25)

Expected value         0.0                         0.0
Standard deviation     1.0                         1.174

Note. The * indicates that no item was available in the pool within ± the acceptance range.

Based on these first possible ability estimates the closest items were selected from the pool with the restriction that their difficulties must have been within plus or minus 0.3 of the requested difficulties. Thus, as Figure 1 illustrates, items with parameter estimates of plus and minus 0.75 were administered to the estimated abilities plus and minus 1.00 respectively. On the upper branch of the tree, a correct response yielded an ability estimate that was again increased by the stepsize, since a maximum-likelihood estimate could not be determined without both a correct and incorrect response. Now, the ability estimate was 2.0. The probability of this correct response to the item with the 0.75 difficulty parameter was 0.32. The bottom branch of the tree was the same except for the change in sign of the item parameters and ability estimates. When the item pool distribution being considered was symmetric, the results of the analyses were the same above the zero point as below the zero point except for the change in sign.

Following the middle branches of the tree, an incorrect response to the item with difficulty 0.75 yielded an ability estimate of 0.375 from the maximum-likelihood technique. The probability of this response was 0.68 based on the model. When the first item was missed and the second answered correctly, the probability of the second response was also 0.68. By the local independence assumption of the model, the probability of each of the ±2.0 estimates was 0.5 × 0.32 = 0.16, while the probability of each of the ±0.375 estimates was 0.5 × 0.68 = 0.34. In this manner the propensity distribution could be obtained after two items had been administered. As noted at the bottom of Figure 1, the expected value was still 0.0 and the standard deviation (which was determined as the square root of $\mathrm{Var}(\hat{\theta})$) was 1.174.
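The two-item arithmetic above can be checked with a short Python sketch. The function names `two_item_propensity` and `moments` are assumptions made here for illustration; the item difficulties and the 0.375 maximum-likelihood estimates are taken directly from the worked example rather than recomputed.

```python
import math


def p_correct(theta: float, b: float) -> float:
    """Simple logistic (Rasch) probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def two_item_propensity(theta: float = 0.0):
    """(estimate, probability) pairs after two items in the worked example.

    First item: difficulty 0.0. Second item: difficulty +0.75 after a
    correct first response, -0.75 after an incorrect one.
    """
    p1 = p_correct(theta, 0.0)       # first response correct
    p_up = p_correct(theta, 0.75)    # correct on the +0.75 item
    p_dn = p_correct(theta, -0.75)   # correct on the -0.75 item
    return [
        (2.0, p1 * p_up),                        # correct, correct
        (0.375, p1 * (1.0 - p_up)),              # correct, incorrect
        (-0.375, (1.0 - p1) * p_dn),             # incorrect, correct
        (-2.0, (1.0 - p1) * (1.0 - p_dn)),       # incorrect, incorrect
    ]


def moments(distribution):
    """Expected value and standard deviation of a discrete distribution."""
    mean = sum(est * prob for est, prob in distribution)
    var = sum(prob * (est - mean) ** 2 for est, prob in distribution)
    return mean, math.sqrt(var)


dist = two_item_propensity(0.0)
mean, sd = moments(dist)
print([round(prob, 2) for _, prob in dist])   # [0.16, 0.34, 0.34, 0.16]
print(round(sd, 3))                           # 1.174
```

Using the unrounded probabilities is what reproduces 1.174; the rounded values 0.16 and 0.34 give a slightly smaller figure.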

The tree developed further in this same manner whenever items within the acceptance range were available. If all correct or incorrect responses were present, the fixed stepsize was used to make ability estimates. Once a mixture of correct and incorrect responses was present, the maximum-likelihood ability estimate procedure was used. Note the "branches" of Figure 1 were "live" at the ±2.00 ability estimates but no items existed in the pool within ±0.3 of the ability estimates ±0.375. Therefore, those branches terminated.



The tree continues to develop by following all "live" paths. The program is finished after all branches are terminated by the condition that no items of appropriate difficulty are available in the pool. One may well imagine that as the number of items in the pool gets larger, the procedure is, practically speaking, bounded by the storage capacity of the computer facility and the magnitude of one's computer budget. For the IBM 370/168 system on which the TREEIP program was run, it was found that sixty-one items was the practical upper limit on the number of items the pool could contain for any particular run of the various combinations of stepsize, acceptance range, and shape of the item difficulty distribution.
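The exhaustive enumeration can be sketched as a recursive walk over every response path. This is an illustrative Python reading of the description above, not the TREEIP source: the function names, the bisection search, the rule that each item is used at most once, and the choice of the first item as the one nearest the pool mean are all assumptions.

```python
import math


def p_correct(theta, b):
    """Simple logistic (Rasch) probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def mle_ability(difficulties, responses, lo=-10.0, hi=10.0, tol=1e-8):
    """Bisection on the likelihood equation sum(u - P) = 0 (assumed method)."""
    def score(theta):
        return sum(u - p_correct(theta, b)
                   for b, u in zip(difficulties, responses))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def nearest_item(requested, pool, used, acceptance_range):
    """Index of the closest unused item within the acceptance range, or None."""
    candidates = [i for i in range(len(pool)) if i not in used]
    if not candidates:
        return None
    best = min(candidates, key=lambda i: abs(pool[i] - requested))
    return best if abs(pool[best] - requested) <= acceptance_range else None


def propensity_distribution(theta, pool, stepsize, acceptance_range):
    """Follow every path; return (final estimate, path probability) pairs."""
    terminal = []
    average = sum(pool) / len(pool)
    first = min(range(len(pool)), key=lambda i: abs(pool[i] - average))

    def walk(idx, used, estimate, prob, diffs, resps):
        b = pool[idx]
        p = p_correct(theta, b)
        for u, p_u in ((1, p), (0, 1.0 - p)):
            d2, r2 = diffs + [b], resps + [u]
            if 0 < sum(r2) < len(r2):          # mixed responses: use MLE
                new_est = mle_ability(d2, r2)
            else:                              # fixed stepsize phase
                new_est = estimate + stepsize if u == 1 else estimate - stepsize
            used2 = used | {idx}
            nxt = nearest_item(new_est, pool, used2, acceptance_range)
            if nxt is None:                    # branch terminates here
                terminal.append((new_est, prob * p_u))
            else:
                walk(nxt, used2, new_est, prob * p_u, d2, r2)

    walk(first, frozenset(), pool[first], 1.0, [], [])
    return terminal


# Nine-item rectangular pool, stepsize 1.0, acceptance range 0.3, theta = 0.
nine_item_pool = [-3.0 + 0.75 * i for i in range(9)]
paths = propensity_distribution(0.0, nine_item_pool, 1.0, 0.3)
expected = sum(est * prob for est, prob in paths)
sd = math.sqrt(sum(prob * (est - expected) ** 2 for est, prob in paths))
print(len(paths), round(sum(prob for _, prob in paths), 6))
```

Because every response doubles the number of live paths, the work grows roughly exponentially with the number of items administered, which is consistent with the practical limit on pool size noted above.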



Due to the limitation on size of the item pool which could be investigated with the TREEIP program, the second computer program, SIMIP, was developed. This program was adapted from the tailored testing procedure based on the Rasch model which was already operational. This particular tailored testing procedure has been described thoroughly elsewhere (Reckase, 1974), so only the details pertinent to this research have been presented. The SIMIP program followed only one path for any given $\theta$, in contrast to the TREEIP. A particular path was selected using Monte Carlo simulation techniques. It provided for investigation of the properties of bias and variance of ability estimation with much larger item pools, since the required storage and computation were substantially reduced as compared to the TREEIP program.

The following values served as input to the program: the stepsize, acceptance range, item pool difficulty values, $\theta$, and number of simulated tests to be administered by the tailored testing procedure. The procedure initially administered an item of average difficulty from the pool of items provided. If a correct response were obtained, the ability was increased by the stepsize. If an incorrect response were obtained, the ability was decreased by the stepsize. The appropriate item for the new ability was administered. This fixed stepsize up and down procedure continued until both a correct and an incorrect answer had been obtained in the response string. Then the procedure switched from the fixed stepsize procedure to maximum-likelihood ability estimation. In both cases, items were selected to maximize the item information (Birnbaum, 1968). Ability estimation was accomplished after each item was administered (provided correct and incorrect responses had previously occurred) by the maximum-likelihood estimation procedure using an iterative search for the mode of the likelihood distribution. The items administered had to be within plus or minus the acceptance range from the requested item difficulty. If no items were available within this range of the estimated ability, the procedure stopped. The only other stopping rule was based on a preset maximum number of items that was to be administered.

Items were scored correct or incorrect by the SIMIP program utilizing an internal random number generator. First, the probability of a correct response was computed using the formula for the probability function of the simple logistic model stated earlier. The $\theta$ for this computation was the true $\theta$ that was input into the program, and the difficulty parameter, $b$, was that of the item just administered to the simulated examinee. After this probability of a correct response had been determined, the random number generator selected a number between zero and one from a rectangular distribution. If this randomly selected number was less than or equal to the probability of a correct response, the item was scored correct. If the randomly selected number was greater than the probability of a correct response, the item was scored as incorrect. An ability estimate was then obtained and the next item to be administered was selected to maximize information for this estimated ability. This procedure continued until one of the stopping rules was encountered.
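The Monte Carlo procedure can be sketched as follows. This is a hedged Python illustration rather than the SIMIP source: the function names, the bisection search, the `max_items` default, the seeded generator, and the use of the sample standard deviation are assumptions. Selecting the unused item nearest the current estimate stands in for maximizing information, since under the simple logistic model item information $P(1-P)$ is largest when the difficulty equals the ability.

```python
import math
import random
import statistics


def p_correct(theta, b):
    """Simple logistic (Rasch) probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def mle_ability(difficulties, responses, lo=-10.0, hi=10.0, tol=1e-8):
    """Bisection on the likelihood equation sum(u - P) = 0 (assumed method)."""
    def score(theta):
        return sum(u - p_correct(theta, b)
                   for b, u in zip(difficulties, responses))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def nearest_item(requested, pool, used, acceptance_range):
    """Index of the closest unused item within the acceptance range, or None."""
    candidates = [i for i in range(len(pool)) if i not in used]
    if not candidates:
        return None
    best = min(candidates, key=lambda i: abs(pool[i] - requested))
    return best if abs(pool[best] - requested) <= acceptance_range else None


def simulate_test(theta, pool, stepsize, acceptance_range, max_items, rng):
    """Run one simulated tailored test and return the final ability estimate."""
    average = sum(pool) / len(pool)
    idx = min(range(len(pool)), key=lambda i: abs(pool[i] - average))
    estimate = pool[idx]
    used, diffs, resps = set(), [], []
    while idx is not None and len(resps) < max_items:
        used.add(idx)
        b = pool[idx]
        # Score the item: correct when the uniform draw is <= P(correct).
        u = 1 if rng.random() <= p_correct(theta, b) else 0
        diffs.append(b)
        resps.append(u)
        if 0 < sum(resps) < len(resps):
            estimate = mle_ability(diffs, resps)
        else:
            estimate += stepsize if u == 1 else -stepsize
        idx = nearest_item(estimate, pool, used, acceptance_range)
    return estimate


def simulate(theta, pool, stepsize, acceptance_range,
             n_tests=25, max_items=20, seed=0):
    """Mean and standard deviation of estimates over repeated simulated tests."""
    rng = random.Random(seed)
    estimates = [simulate_test(theta, pool, stepsize, acceptance_range,
                               max_items, rng) for _ in range(n_tests)]
    return statistics.mean(estimates), statistics.stdev(estimates)


# Hypothetical 25-item rectangular pool spaced 0.25 apart from -3 to +3.
pool = [-3.0 + 0.25 * i for i in range(25)]
print(simulate(0.0, pool, stepsize=0.5, acceptance_range=0.3))
```

Each simulated test stores only one response string, so the cost grows with the number of tests and items rather than with the number of possible paths.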

The major controlling program parameters for both the TREEIP and SIMIP were the stepsize and acceptance range values. The stepsize parameter controlled how quickly the procedure would move through the item pool, while the acceptance range parameter specified how discrepant items could be from those desired and still be administered. The acceptance range also indirectly determined the number of items from the pool which were available for administration. Clearly, the wider the acceptance range, the greater the number of items that could have been chosen for administration.

The TREEIP and SIMIP programs used in this study for determining the optimal stepsize, acceptance range, item pool size, and item pool distribution were similar in that both output the mean and standard deviation of the ability estimates for each true $\theta$ input. However, they differed in the manner in which the mean and standard deviation were determined. While the TREEIP pursued all possible paths through the item pool, the SIMIP followed only the path that was the result of the simulated interaction of an examinee with the tailored testing procedure. The mean and standard deviation from the TREEIP were actually expected values and square roots of variance computed from probabilities arising from the one-parameter model and ability estimates arising from the maximum-likelihood estimation technique. The SIMIP program provided a mean and standard deviation of the set of ability estimates obtained for each of the $\theta$'s specified.



Research Design 

To investigate the optimal stepsize, acceptance range, item pool size, and item pool shape, nearly all possible combinations of the following were input into the TREEIP and SIMIP programs for true abilities -3, -2, -1, 0, 1, 2, and 3. The stepsize values used were .3, .4, .5, .6, .693, .8, .9, 1.0, 1.5, 2.0, and 3.0, while acceptance ranges were .1, .2, .3, .4, and .5. Item pool sizes were 9, 13, 25, 31, 61, 72, 180, and 181. Item pool shapes investigated were normal, rectangular, bimodal, and skewed, with difficulty values constrained between plus and minus three. Idealized item pools (difficulty values in the above shapes with spacing dependent on shape and size of item pool) were constructed and used as input to the programs, as well as actual item pools (test items calibrated and formed into pools with no constraint on the spacing along the difficulty scale).

The manner in which item pool size effects were investigated using simulations was to run the TREEIP and SIMIP programs on the various sized pools mentioned above. With the resulting data, plots and projections were made to estimate the item pool sizes needed for various accuracies of ability estimation. The relationships between the item pool size, bias, and the standard deviation were determined.

The comparisons to determine the optimal combination of independent
variables were based upon the mean and standard deviation of twenty-five
simulated administrations of a tailored test to each θ using the SIMIP;
whereas for the TREEIP program, the comparisons were of the expected value
of θ̂, E(θ̂), and the standard deviation of θ̂, √Var(θ̂). Values of these
dependent variables were compared across program runs using various sized
item pools, holding stepsize and acceptance range constant. They were also
compared from runs using various shapes of item pools, holding size of
item pool, stepsize, and acceptance range fixed. Additionally, comparisons
were made of the dependent variables, first varying stepsize with
all other variables fixed, and then varying the value of the acceptance
range while holding all other variables constant. Since the TREEIP program
was considered to yield the most accurate values, i.e. E(θ̂) and
√Var(θ̂) based upon the propensity distribution, another comparison was
deemed important. Because the SIMIP means and standard deviations were
subject to sample variation, they were validated against values of the
TREEIP for various runs on the sixty-one item pool. Also, the number of
estimates of the true ability, i.e. the number of tailored tests
administered to each simulated examinee by the SIMIP program, was varied. This
was done to check whether an appropriate number of administrations had






Results

The results of this study were to a great extent drawn from tables
which summarized the results of the TREEIP and SIMIP programs. One issue
to be investigated was the type of distribution of item pool difficulty
parameters that yielded the least bias and standard error of ability
estimates. Another important issue was how large an item pool was
necessary to accomplish the goal of good ability estimation. Thirdly, a
determination of the preferred magnitude of the stepsize parameter was
desired. The fourth outcome of this study was to decide upon an
approximate value of the acceptance range program parameter which would
provide ability estimates with the least bias and standard error. These
were the primary targets of the study.

Secondary goals of the study included a comparison of the performance
of actual versus ideal item pools. Another secondary objective was to
compare the results of the TREEIP and SIMIP programs. In this regard,
two concerns were investigated. One pertained to how close the SIMIP
estimates of the means and standard deviations of ability were to the
E(θ̂) and √Var(θ̂) determined by the TREEIP. The importance of this
particular concern related to how well the SIMIP analyses on larger item
pools would bear on the primary questions of this study.
It should be recalled that the motivation for development of the SIMIP
program was to investigate the research questions of the study on larger
item pools than the TREEIP program would realistically accommodate. The
second concern subsumed under comparison of the TREEIP and SIMIP programs
was to decide whether or not 25 estimates of each ability by the SIMIP
was an adequate number. Several analyses were run using the SIMIP program
on pools from which data had already been obtained from the
TREEIP. By running the SIMIP on the same pools, holding all other variables
fixed except the number of test administrations, data were obtained
pertaining to the adequacy of the SIMIP estimates of the means and standard
deviations. Another matter along this same line was investigated with
runs of the SIMIP on some of the larger pools. This was the question
of whether or not 20 items was an adequate upper limit on the number of
items administered by the tailored test.
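
The two kinds of summary statistic compared in this study can be sketched
in a few lines of Python. This is only an illustration, not the FORTRAN
code of SIMIP or TREEIP. The function names are hypothetical, and the use
of the sample (n - 1) standard deviation for the simulated administrations
is an assumption, since the report does not say which divisor was used.

```python
import math
from statistics import mean, stdev


def simulated_summary(estimates, true_theta):
    """SIMIP-style summary: mean, bias, and SD of repeated ability estimates.

    Assumption: the sample (n - 1) standard deviation is used.
    """
    m = mean(estimates)
    return {"mean": m, "bias": m - true_theta, "sd": stdev(estimates)}


def propensity_summary(outcomes, true_theta):
    """TREEIP-style summary from (estimate, probability) pairs.

    The expected value and standard deviation are probability-weighted,
    so they carry no sampling variation.
    """
    total = sum(p for _, p in outcomes)
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    expected = sum(est * p for est, p in outcomes)
    variance = sum(p * (est - expected) ** 2 for est, p in outcomes)
    return {
        "expected": expected,
        "bias": expected - true_theta,
        "sd": math.sqrt(variance),
    }
```

With 25 simulated administrations per ability level, the first function
would be called once per level with a list of 25 estimates; the second
needs the full set of possible final estimates and their probabilities.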

Item Pool Shape

The TREEIP program (propensity distribution technique) was used to
evaluate the effects of varying the shape of the item pool difficulty
distribution on ability estimation. Four shapes of item pools were studied:
rectangular, normal, bimodal and skewed. The rectangular item pools were
obtained simply by selecting equally spaced items between +3.0 and -3.0
inclusive. The normal item pools were constructed such that the items
were equally spaced in probability. That is, the area between item
positions was kept constant in the range from +3.0 to -3.0 standard
deviations of the normal distribution. This procedure for producing the
normally distributed pools had the effect of selecting more items around
the difficulty value of zero and fewer items at the extremes. A similar
procedure was used for selecting the item parameters for the bimodal pools
as was used for selecting the normal pools. The negative half of the pool
was centered around -.693 and the area under the normal distribution was
used to place items around this point up to zero and down to -3.0. The same
was true for the positive half of the pool. The reason ±.693 were chosen
as the two modes of the bimodal distribution was that, prior to the
construction of a bimodal pool, .693 had appeared promising as a stepsize
value. Therefore, after the first item was administered at 0, the
stepsize of .693 would move the ability estimate out to one of the more dense
regions of the pool depending upon whether the examinee correctly or
incorrectly answered the first item. The skewed item pool distribution of item
parameters was constructed via a similar procedure to that for the normal
and bimodal pools. That is, the items divided the distribution into equal
areas. For the skewed pool, tables of the Pearson Type III distribution
were used. The pool constructed was positively skewed (skewness = .5).
It should be noted that in the tables included in this report, a skewed
distribution always indicates a positive skew. However, the results would
generalize to negatively skewed pools.
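
The construction of the rectangular and normal idealized pools described
above can be sketched as follows. This is a minimal illustration in Python
rather than the original procedure: the function names are hypothetical,
and placing the first and last items exactly at -3.0 and +3.0 for the
normal pool is an assumption about how the endpoints were handled.

```python
from statistics import NormalDist


def rectangular_pool(n, lo=-3.0, hi=3.0):
    """n item difficulties equally spaced on the difficulty scale."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def normal_pool(n, lo=-3.0, hi=3.0):
    """n item difficulties equally spaced in normal probability.

    The area under the standard normal curve between adjacent items is
    constant, so items are denser near zero and sparser at the extremes.
    """
    nd = NormalDist()
    p_lo, p_hi = nd.cdf(lo), nd.cdf(hi)
    step = (p_hi - p_lo) / (n - 1)
    return [nd.inv_cdf(p_lo + i * step) for i in range(n)]
```

For a 25 item pool, the normal construction places noticeably more items
between -1 and +1 than the rectangular construction does, which is the
effect the text attributes to the normally distributed pools.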

Results concerning the shape of the item pool distribution may be
seen in Tables 1-6 for different combinations of values of the other
variables; however, Tables 1 and 2 point out the more general trends of the
item distribution study. In Table 1 the comparisons of the normal and
rectangular pools of 25 items are shown for only acceptance ranges of 0.1
and 0.3 when paired with stepsizes of 0.5 and 0.7 respectively. These
values of acceptance range and stepsize were chosen because they appeared
to yield some of the least bias and least variance estimates. Specifically,
the acceptance range of 0.1 was chosen to check whether the more dense
item parameters near the middle of the normal distribution would make the
use of the smaller acceptance range desirable.



Table 1
Comparison of TREEIP Results from
25 Item Rectangular and Normal Item Distributions

                                                   Ability Level
Acceptance                          0.0            0.5            1.0            2.0            3.0
Range   Stepsize  Distribution   E(θ̂)   Sθ̂     E(θ̂)   Sθ̂     E(θ̂)   Sθ̂     E(θ̂)   Sθ̂     E(θ̂)   Sθ̂

0.1     0.5       Rectangular    0.001  0.91?  0.4?0  0.92?  0.944  0.943  1.893  0.968  2.764  0.884
0.1     0.5       Normal        -0.00?  0.9?1  0.522  0.904  0.980  0.762  1.468  0.426  1.555  0.251

0.3     0.7       Rectangular    ?      0.7??  0.430  0.824  0.911  0.893  1.986  0.984  2.933  0.773
0.3     0.7       Normal         0.000  0.9?9  0.623  0.922  1.169  0.821  1.877  0.491  2.093  0.231



' ■' "' f ' ■ •at'i' I, "i" ni-n.il d i tt i ,'iwt i on .ippoar-i to be 

f •'(. t.uK] J 1 .ir i In- ■!; ,tr liuit loM in almost, all cases. Except 
t'''' '• i.cp' iint.c J\^u'\^• •i.it.ii ,i' (■).'> .tnd ] .:) ability levels, eitnet' 






^ 1.256 0.373 2.267 0.693 2.840 0 543 

1.245 0.876 2.281 0.688 2 852 0 525 

.008 0.677 2.138 0.805 3.111 0 566 

K2.S2 0.858 2.257 0.670 2.801 0.561 






s 



">c from the true -j or the standard devia- 
It IS interesting to note that even the estimates 
00(1 for' the normally distributed pool as for 

tH;;'noZl pool! ''''''' estimation 



Table 2

Expected Values and Standard Deviations
from TREEIP on Various Shaped Item Pools



Ability Level 



I 



'■^■^^ \ E(u) S^^ E(0) 



* 'fMI 

i.' i 1 i ti us 



1^ with 61 items with the stepsize and accep- 
v.rameters set at 0.693.and 0.30 respectively. 



■ ■> f i » , 



. were presented since the results are 

0 L-xcept for the skewed pool. 



-t^t>d values and standard deviations from the 
'■octangular, and positively .skewed pools 
<ty-one items. The stepsize was fixed at'o.693, 

'leld at 0.30 for all runs. Again the rec- 
^^^•r- overall than did the other shapes of item 
^'r true abilities zero and one, the standard 
'■'^^^ 3S the bias of the estimates, was 
•>incj the rectangular pool. At the ability levels 
!ir pool yielded estimates with less bias 
a-iior standard deviations than the other 



:M : 



•;• TKLLIP would have been the same for 
.Tntinuun when the pools were symmetric 
• i.ijos of ability were run for the normal," 
•! •. Howovor, for the skewed pool containing 
!^'lIry values of -1, -2, and -3 were run 

were indicated in Table 2. The results 
• ? ' ' -1 .189 and So - 0.836. For -2, the 
■ ' ■ the E( •) -2.935 and Sn = 0.577. 
' iiool as being better suited'for ability 
since It contained more items around 
•• '■•Itrr- than the rectangular" pool. 



Item Pool Size



The criteria for judging how large an item pool was needed for good
ability estimation using the tailored testing procedure were again the
bias and standard error of ability estimates. The results of the
simulations using both the TREEIP and SIMIP programs have been condensed,
and the general trend has been illustrated in Figure 2. The values of
the E(θ̂) and Sθ̂ which have been plotted for item pools of size 9, 13,
25, 31, and 61 were obtained from the TREEIP. Each of these pools had
a rectangular distribution of item difficulty parameters. The means and
standard deviations of ability estimates on the SIMIP runs on VCIPL and
LTIPL (described earlier) have been included in the plots of Figure 2.
Each analysis represented in this figure had θ set equal to 1.0, the
stepsize fixed at 0.693, and acceptance range equal to 0.30.

The top graph of Figure 2 illustrates that as item pool size reaches
61 for this particular set of analyses, the E(θ̂) is equal to θ. The bias
of the ability estimates is essentially zero. The bottom graph of Figure
2 shows that as item pool size increases, the standard error decreases.
While these plots should be considered as rough approximations of the
relationship between item pool size and ability estimate bias and
standard error, the indication appears to be that with a uniform distribution
of item difficulty, θ = 1, and the program parameters equal to the values
used here, one could expect very little bias and a standard error of about
0.3 with an item pool consisting of around 200 items. More will be presented
on item pool size in the discussion section of this report.



Stepsize

The results of the study of the preferred magnitude of the stepsize
program parameter may be seen in Tables 3, 4, 5, and 7. Tables 3, 4, and
5 give the E(θ̂) and Sθ̂ from TREEIP analyses of θ = 0, 1, 2, and 3 using
item pools of size 9, 13, 25, 31, and 61 for the rectangular, normal, and
bimodal distributions of item difficulty parameters, respectively. Table
7 presents the results of the SIMIP analyses on the ETIPL item pool for
θ = -3, -2, -1, 0, 1, 2, and 3. Negative θ values are not shown in Tables
3, 4, and 5 since the results of the TREEIP on the pools used are the
same as for the positive θ values except for the change of sign. This
was expected since the item pool distributions of item difficulty are
symmetric around zero. The acceptance range for all analyses for Tables
3, 4, and 5 was 0.30. For the SIMIP analyses of the ETIPL, a substantially
larger item pool, a smaller acceptance range, 0.25, was used as is noted
at the bottom of Table 7. Another variable recorded in Table 7 is the
mean number of items administered for the 25 tests simulated by the SIMIP
for each ability level. The maximum number of items per simulated test
was 20 for these SIMIP analyses.

In general, results presented in Tables 3, 4, and 5 suggest that
stepsizes between 0.5 and 1.0 give fairly unbiased estimates, and also
have the smallest standard errors. Larger stepsizes tend to have a
positive bias and larger standard errors. From several graphs like the ones
presented in Figure 3, the stepsize value of 0.693 appears to be the best



Figure 2
Relationship Between Item Pool Size and the E(θ̂) and Sθ̂
(horizontal axis: item pool size; stepsize = 0.693; acceptance range = 0.30)



Table 3

Expected Values and Standard Deviations
from TREEIP on Rectangular Item Pools
Varying Pool Size, Stepsize and Ability Level

                                           Ability Level
                            0                1                2                3
Pool Size  Stepsize    E(θ̂)    Sθ̂      E(θ̂)    Sθ̂      E(θ̂)    Sθ̂      E(θ̂)    Sθ̂

    9      0.6        -0.000  0.645    0.405  0.603    0.709  0.482    0.877  0.335
           0.693      -0.001  1.025    0.756  1.113    1.593  1.217    2.388  1.139
           1.0        -0.001  1.155    0.821  1.213    1.685  1.298    2.548  1.286
           1.5        -0.001  1.182    0.934  1.268    1.966  1.439    3.016  1.423

   13      0.5        -0.001  0.765    0.655  0.937    1.577  1.219    2.599  1.201
           0.693      -0.001  0.976    0.733  1.056    1.587  1.217    2.454  1.168
           1.0        -0.001  1.187    1.037  1.150    1.995  1.085    2.822  1.005
           1.5        -0.006  1.125    0.899  1.249    1.960  1.463    3.045  1.424

   25      0.25       -0.001  0.547    0.584  0.809    1.606  1.200    2.783  1.190
           0.5        -0.001  0.736    0.857  0.842    1.933  1.000    2.964  0.809
           0.6         0.001  0.744    0.896  0.888    1.986  1.004    2.955  0.788
           0.693      -0.013  0.786    0.910  0.892    1.984  0.980    2.925  0.765
           0.8        -0.013  0.801    0.931  0.934    2.047  1.042    3.045  0.845
           0.9        -0.001  0.845    0.996  0.895    2.061  0.972    2.996  0.784
           1.0        -0.001  0.829    0.990  0.901    2.099  1.036    3.135  0.867
           1.5        -0.001  0.972    1.109  1.086    2.318  1.22?    3.389  1.040
           1.7        -0.001  1.473    1.329  1.417    2.477  1.116    3.143  0.614
           2.0        -0.001  1.551    1.389  1.553    2.673  1.348    3.535  0.846
           3.0        -0.001  1.555    1.361  1.741    2.863  1.930    4.248  1.750

   31      0.5         0.004  0.726    0.949  0.788    2.022  0.902    3.018  0.725
           0.693      -0.003  0.742    0.973  0.826    2.068  0.907    2.997  0.672
           1.0        -0.003  0.776    1.009  0.866    2.140  0.995    3.183  0.817
           1.5        -0.005  0.925    1.116  1.050    2.002  1.388    3.382  1.023

   61      0.5        -0.001  0.598    0.989  0.657    2.116  0.804    3.133  0.593
           0.693      -0.001  0.610    1.008  0.677    2.138  0.805    3.111  0.566
           1.0        -0.000  0.641    1.039  0.745    2.229  0.915    3.239  0.689
           1.5        -0.001  0.734    1.100  0.894    3.560  0.899    3.560  0.899

Note. Acceptance Range = 0.30



Table 4

Expected Values and Standard Deviations
from TREEIP on Normal Item Pools
Varying Pool Size, Stepsize and Ability Level

                                           Ability Level
                            0                1                2                3
Pool Size  Stepsize    E(θ̂)    Sθ̂      E(θ̂)    Sθ̂      E(θ̂)    Sθ̂      E(θ̂)    Sθ̂

    9      0.5        -0.001  1.018    0.848  0.847    1.318  0.491    1.463  0.226
           0.693      -0.001  1.098    0.960  0.966    1.601  0.655    1.898  0.382
           1.0        -0.001  1.26?    0.880  1.084    1.641  0.63?    1.877  0.334
           1.5         0.000  1.500    0.693  1.330    1.142  0.972    1.358  0.638

   13      0.5        -0.001  1.028    1.062  0.866    1.697  0.514    1.922  0.237
           0.693      -0.001  1.101    1.002  0.942    1.648  0.628    1.932  0.358
           1.0        -0.000  1.273    1.146  1.020    1.760  0.548    1.946  0.231
           1.5        -0.001  1.439    1.272  1.282    2.219  1.188    3.031  1.258

   25      0.25       -0.001  0.847    1.110  0.858    1.969  0.576    2.210  0.408
           0.5        -0.001  0.891    1.184  0.837    2.016  0.572    2.359  0.278
           0.6        -0.001  0.980    1.203  0.847    1.965  0.528    2.263  0.266
           0.693      -0.000  0.956    1.174  0.811    1.871  0.482    2.079  0.227
           0.8        -0.001  1.009    1.234  0.871    2.004  0.539    2.292  0.253
           0.9        -0.001  1.052    1.290  0.964    2.223  0.784    2.818  0.658
           1.0        -0.001  1.055    1.295  0.979    2.263  0.858    2.949  0.820
           1.5        -0.001  1.327    1.384  1.186    2.394  1.070    3.167  1.114
           1.7        -0.001  1.536    1.521  1.363    2.549  0.968    3.047  0.628
           2.0        -0.001  1.738    1.653  1.600    2.845  1.248    3.492  0.884
           3.0        -0.001  1.792    1.627  1.749    2.928  1.814    4.045  1.883

   31      0.5        -0.000  0.869    1.218  0.805    2.046  0.557    2.385  0.277
           0.693      -0.001  0.964    1.268  0.880    2.192  0.734    2.778  0.607
           1.0        -0.001  1.018    1.323  0.951    2.300  0.823    2.969  0.787
           1.5        -0.001  1.301    1.404  1.155    2.410  1.043    3.176  1.092

   61      0.5        -0.000  0.753    1.201  0.797    2.132  0.541    2.465  0.254
           0.693      -0.000  0.866    1.256  0.873    2.267  0.693    2.840  0.543
           1.0        -0.000  0.915    1.298  0.944    2.361  0.774    3.010  0.711
           1.5        -0.000  1.232    1.399  1.141    2.473  1.004    3.227  1.044

Note. Acceptance Range = 0.30



Table 5

Expected Values and Standard Deviations
from TREEIP on Bimodal Item Pools
Varying Pool Size, Stepsize and Ability Level

                                           Ability Level
                            0                1                2                3
Pool Size  Stepsize    E(θ̂)    Sθ̂      E(θ̂)    Sθ̂      E(θ̂)    Sθ̂      E(θ̂)    Sθ̂

    9      0.5        -0.004  1.020    0.231  0.443    1.312  0.495    1.473  0.245
           0.693      -0.004  1.095    0.951  0.968    1.?01  0.666    1.903  0.383
           1.0        -0.001  1.264    1.036  1.042    1.639  0.628    1.876  0.331
           1.5        -0.001  1.442    1.216  1.326    2.187  1.252    3.027  1.291

   13      0.5        -0.001  1.006    1.009  0.903    1.671  0.579    1.917  0.275
           0.693      -0.001  1.104    1.001  0.945    1.647  0.630    1.932  0.358
           1.0        -0.000  1.267    1.143  1.011    1.754  0.551    1.946  0.238
           1.5        -0.000  1.436    1.274  1.276    2.217  1.181    3.029  1.252

   25      0.25       -0.000  0.920    1.102  0.855    2.001  0.623    2.264  0.421
           0.5        -0.001  0.870    1.152  0.867    2.024  0.594    2.373  0.278
           0.6        ?0.001  0.951    1.173  0.875    1.976  0.536    2.271  0.242
           0.693      -0.001  0.964    1.207  0.933    2.174  0.768    2.774  0.612
           0.8        -0.001  0.953    1.183  0.887    2.020  0.589    2.335  0.272
           0.9        -0.002  1.025    1.260  0.994    2.246  0.780    2.833  0.631
           1.0        -0.001  1.017    1.257  1.002    2.280  0.860    2.969  0.791
           1.5        -0.001  1.294    1.350  1.192    2.396  ?.064    3.176  1.091
           1.7         0.002  1.491    1.483  1.362    2.543  ?.959    3.047  0.612
           2.0        -0.000  1.717    1.609  1.592    2.837  1.235    3.485  0.871
           3.0        -0.001  1.761    1.601  1.763    2.953  1.803    4.070  1.857

   31      0.5        -0.000  0.796    1.145  0.816    2.060  0.621    2.476  0.406
           0.693      -0.000  0.924    1.229  0.912    2.218  0.741    2.814  0.585
           1.0        -0.000  0.957    1.262  0.956    2.298  0.832    3.004  0.758
           1.5        -0.002  0.968    1.284  1.049    2.446  1.080    3.338  1.015

   61      0.5         0.006  0.726    1.174  0.800    2.246  0.692    2.903  0.572
           0.693      -0.000  0.857    1.245  0.876    2.281  0.688    2.852  0.525
           1.0         0.033  0.867    1.221  0.897    2.356  0.820    3.107  0.714
           1.5         0.185  1.128    1.249  1.003    2.497  1.050    3.407  0.949

Note. Acceptance Range = 0.30



Table 6

Means and Standard Deviations
from SIMIP on a Bimodal and
Skewed Item Pool Varying
Number of Test Administrations



Shape of Pool 



Number of Tests . ^ , 

Administered Bimodal Skewed 



X 


^0 


^0 


^0 


2.207 


0.627 


2.193 


0.622 


2.242 


0.634 


2.225 


0.627 


2.262 


0.645 


2.216 


0.603 



Note. All runs made with 20 item upper limit, stepsize
= .693, and acceptance range = 0.30. The true
ability was set at 2.0. Both the pools had 61
items.



Table 7

Means and Standard Deviations
from SIMIP on ETIPL Item Pool
Varying Stepsize

                                         Ability Level
Stepsize                -3       -2       -1        0        1        2        3

(illegible)   X̄      -2.886   -2.145   -0.992   -0.050    1.135    1.991    3.331
              Sθ̂      0.715    0.728    0.486    0.534    0.502    0.627    0.788
              Mni*    13.04    15.88    19.24    20.00    20.00    19.84    18.40

(illegible)   X̄      -2.779   -2.230   -1.132    0.129    0.952    2.009    2.972
              Sθ̂      0.491    0.681    0.550    0.461    0.374    0.515    0.857
              Mni*    12.24    13.96    19.68    20.00    20.00    19.76    18.24

(illegible)   X̄      -3.157   -2.139   -1.134    0.064    1.018    2.055    3.213
              Sθ̂      0.652    0.645    0.800    0.503    0.363    0.516    0.844
              Mni*    10.04    14.48    18.56    20.00    19.92    19.56    16.08

(illegible)   X̄      -3.168   -2.250   -1.052    0.001    1.070    1.987    2.910
              Sθ̂      0.611    0.782    0.547    0.518    0.444    0.531    0.554
              Mni*     9.56    17.04    19.24    20.00    20.00    19.48    18.12

(illegible)   X̄      -?.76?   -2.096   -1.122   -0.070    1.136    2.076    3.053
              Sθ̂      0.539    0.619    0.700    0.539    0.562    0.548    0.718
              Mni*     9.20    14.72    18.12    20.00    20.00    19.40    16.28

.7            X̄      -3.061   -2.175   -1.026   -0.065    1.029    1.950    2.913
              Sθ̂      0.561    0.460    0.573    0.469    0.516    0.696    0.533
              Mni*     7.80    13.12    18.92    20.00    20.00    19.16    15.92

.9            X̄      -3.134   -2.271   -1.241    0.094    0.959    2.029    3.310
              Sθ̂      0.499    0.790    0.898    0.419    0.380    0.531    0.799
              Mni*     5.92    11.40    16.96    20.00    19.84    19.20    13.28

1.5           X̄      -3.739   -2.501   -1.389    0.101    1.035    2.437    3.239
              Sθ̂      0.876    0.961    0.910    0.598    0.792    1.118    1.010
              Mni*     5.32    10.80    18.04    20.00    19.32    16.16    12.72

2.0           X̄      -3.683   -2.972   -1.482   -0.329    1.100    2.032    3.631
              Sθ̂      0.514    1.044    1.194    1.175    0.450    0.913    1.345
              Mni*     4.24     8.76    16.56    18.56    19.96    18.48    13.36

3.0           X̄      -4.530   -2.942   -1.751   -0.042    1.230    2.511    4.471
              Sθ̂      1.591    1.494    1.916    0.465    1.117    1.556    1.519
              Mni*     5.04    10.68    16.52    20.00    19.28    17.04     8.60

Note. All runs made with 25 administrations per ability level, 20 item upper
limit, and acceptance range = .25.
*Mni = mean number of items administered



overall compromise value which achieves less bias while holding the
standard error down. Figure 3 shows the E(θ̂) and Sθ̂ for the 31 item
rectangular, normal and bimodal pools when θ = 1.0 and the acceptance range
equals 0.30 for various stepsizes.

Table 7, which reports the results of the SIMIP on the ETIPL pool,
presents information that suggests a stepsize between 0.4 and 0.7 yields
less bias and a smaller standard error. It should be recalled that the
SIMIP is subject to sample variation, but in general, the results seem
to suggest that a stepsize of about 0.7 is appropriate. However, a trend
which should be investigated further is that larger item pools seem to
do better with smaller stepsizes and conversely.



Acceptance Range



Figure 3
Relationship Between Stepsize and the E(θ̂) and Sθ̂
(key: rectangular, normal, bimodal, and true θ in the top figure;
θ = 1.0, pool size = 31, acceptance range = .3)

The results of the acceptance range study are given in Tables 8,
9, and 10. Table 8 presents the E(θ̂) and Sθ̂ for stepsizes 0.5, 0.693,
1.0, and 1.5; acceptance ranges 0.1, 0.2, 0.3, and 0.4; and ability levels
0.0, 1.0, 2.0, and 3.0 from TREEIP analyses. All of the results in Table
8 are based on the 25 item rectangular pool. From Table 8 it can be seen
that in most cases, as the acceptance range increases, the standard
deviation decreases. This is a reasonable result since more items are
available for administration with a larger acceptance range. However, there
is also a trend of increased bias in estimate as the acceptance range
increases, particularly at the higher ability levels and for the larger
stepsizes.
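
The role of the acceptance range in item selection can be illustrated with
a short Python sketch. It is not taken from the TREEIP or SIMIP programs:
the function name, the choice of the closest unused item, the inclusive
comparison, and the return of None when no item qualifies are all
assumptions made for illustration.

```python
def select_item(pool, used, target, acceptance_range):
    """Return the index of the closest unused item within the acceptance range.

    pool is a list of item difficulties, used is a set of indices already
    administered, and target is the difficulty requested by the procedure.
    Returns None when no unused item lies within acceptance_range of target
    (what the original programs do in that case is not stated here).
    """
    best = None
    for i, difficulty in enumerate(pool):
        if i in used:
            continue
        distance = abs(difficulty - target)
        if distance > acceptance_range:
            continue
        if best is None or distance < abs(pool[best] - target):
            best = i
    return best
```

Widening the acceptance range makes it more likely that some item
qualifies, which is consistent with the observation that more items are
available for administration with a larger acceptance range, at the cost
of administering items further from the requested difficulty.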

Table 9 shows the results of the SIMIP on the VCIPL pool using 25
test administrations per ability level; 20 item upper limit; stepsize
= .693; and θ = -3, -2, -1, 0, 1, 2, and 3. The mean number of items
is also indicated. These results indicate that an acceptance range of
0.30 is probably the best compromise value for minimizing bias and
standard error of ability estimates across the range of θ. Table 10 shows
the results of the SIMIP on the LTIPL pool using 25 test administrations
per ability level; 40 item upper limit; stepsize = .693; and θ = -3, -2,
-1, 0, 1, 2, and 3. Again, the mean number of items is indicated. These
results on LTIPL are somewhat more ambiguous although the extreme
acceptance range values are clearly inferior to the more moderate values of

Table 8

Expected Values and Standard Deviations
from TREEIP on 25 Item Rectangular Pool
by Step Size and Acceptance Range

                                           Stepsize
Ability  Acceptance       0.5           0.693           1.0            1.5
Level    Range        E(θ̂)   Sθ̂     E(θ̂)   Sθ̂     E(θ̂)   Sθ̂     E(θ̂)   Sθ̂

0.0      .1          -0.00  0.92   -0.00  0.84   -0.00  1.04   -0.01  1.07
         .2          -0.00  0.81   -0.02  1.01   -0.00  0.88   -0.00  1.06
         .3          -0.00  0.74   -0.01  0.79   -0.00  0.83   -0.00  0.97
         .4          -0.00  0.76   -0.01  0.78   -0.00  0.81   -0.00  0.93

1.0      .1           0.94  0.94    0.55  0.80    1.08  1.06    0.89  1.23
         .2           0.89  0.87    1.00  1.04    1.00  0.94    0.90  1.22
         .3           0.86  0.84    0.91  0.89    0.99  0.90    1.11  1.09
         .4           0.94  0.81    0.96  0.83    1.00  0.89    1.10  1.07

2.0      .1           1.89  0.97    0.97  0.66    2.09  1.03    1.99  1.45
         .2           1.92  0.99    1.88  0.97    2.08  1.04    2.00  1.45
         .3           1.93  1.00    1.98  0.98    2.10  1.04    2.32  1.22
         .4           2.01  0.92    2.03  0.91    2.12  1.02    2.33  1.21

3.0      .1           2.76  0.88    1.21  0.46    2.93  0.93    3.09  1.39
         .2           2.89  0.85    2.74  0.89    3.08  0.91    3.10  1.39
         .3           2.96  0.81    2.92  0.76    3.14  0.87    3.39  1.04
         .4           3.00  0.74    2.97  0.72    3.16  0.84    3.42  1.01



Table 9

Means and Standard Deviations
from SIMIP on VCIPL Item Pool
Varying Acceptance Range

                                         Ability Level
Acceptance
Range                   -3       -2       -1        0        1        2        3

.1            X̄      -1.938   -1.713   -0.994   -0.491    1.101    1.873    2.913
              Sθ̂      0.430    0.794    0.?73    0.810    0.976    0.676    0.447
              Mni*     3.64     5.56     6.84     8.52    11.12     9.72     6.24

.2            X̄      -2.747   -2.133   -1.193   -0.152    1.208    2.268    2.889
              Sθ̂      0.790    0.520    0.779    ?        0.739    0.686    0.540
              Mni*     6.96     8.88    12.56    14.44    15.44    10.56     7.20

.3            X̄      -2.955   -2.085   -1.311   -0.021    1.025    2.229    3.109
              Sθ̂      0.823    0.555    0.943    0.385    0.578    0.581    0.510
              Mni*     7.00    10.00    11.24    16.96    17.24    12.96     7.68

.4            X̄      -3.171   -2.404   -1.346   -0.007    0.869    2.234    2.950
              Sθ̂      0.690    0.538    0.681    0.344    0.399    0.775    0.579
              Mni*     7.08     8.08    14.60    18.44    19.72    14.64     9.64

.5            X̄      -3.157   -2.242   -1.051    0.160    0.941    2.340    3.117
              Sθ̂      0.606    0.791    0.619    0.755    0.546    0.780    0.497
              Mni*     8.16    11.40    17.04    18.64    19.28    14.32     9.48

Note. All runs made with 25 administrations per ability level, 20
item upper limit, and stepsize = .693.
*Mni = mean number of items administered



to .4. In cases such as this, one should consider a combination of
the density of the item pool across the range of θ and whether a
particular θ range should be estimated more precisely than others, in order
to decide on the best acceptance range value. Decisions regarding the
best value of program parameters cannot be made independent of
considerations of the size and shape of the item pool to be used.



Secondary results include the comparison of the performance of actual
versus ideal item pools previously discussed. Table 11 shows this
comparison, and overall, the ideal pool did not perform much better than the
ETIPL pool.

Another comparison was of the SIMIP and TREEIP programs on the same
pools using the same program parameter values. By looking at Table 2
and Table 6, one can see that the SIMIP did a reasonably good job of





Table 10

Means and Standard Deviations
from SIMIP on LTIPL Item Pool
Varying Acceptance Range

                                         Ability Level
Acceptance
Range                   -3       -2       -1        0        1        2        3

.1            X̄      -2.528   -2.200   -1.174   -0.111    0.974    2.001    3.299
              Sθ̂      0.559    0.667    0.700    0.569    0.903    0.471    0.781
              Mni*     6.64     9.24    17.16    26.60    27.80    16.44    11.16

.2            X̄      -2.989   -2.159   -1.144   -0.016    0.926    2.152    3.451
              Sθ̂      0.491    0.559    0.731    0.332    0.362    0.464    0.765
              Mni*     7.20    14.52    22.40    31.60    33.60    22.36    13.40

.3            X̄      -3.103   -2.475   -1.162    0.003    1.016    2.161    3.024
              Sθ̂      0.576    0.594    0.630    0.239    0.401    0.410    0.747
              Mni*     7.50    1?.40    27.96    36.72    37.88    25.32    18.16

.4            X̄      -?.064   -2.359   -1.121   -0.094    1.043    2.073    3.054
              Sθ̂      0.615    0.315    0.582    0.261    0.316    0.336    0.520
              Mni*    10.20    13.00    31.40    39.00    39.36    31.52    20.12

.5            X̄      -3.378   -2.455   -1.088    0.031    0.993    1.920    3.195
              Sθ̂      0.716    0.715    0.510    0.394    0.356    0.389    0.584
              Mni*    10.24    18.48    3?.08    39.80    33.48    35.12    20.75

Note. All runs made with 25 administrations per ability level, 40
item upper limit, and stepsize = .693.
*Mni = mean number of items administered



approximating the TREEIP results at θ = 2 for the bimodal and skewed pools.
Also, from Table 6, it can be seen that increasing the number of tests
administered by the SIMIP did not dramatically change the means and
standard deviations. Therefore, 25 administrations seemed adequate.

Finally, by comparing cells of Tables 7 and 10, one can see that
increasing the maximum number of items administered from 20 to 40 does
not substantially change the means and standard deviations from the SIMIP.
This comparison is not exact because the acceptance range of 0.25 used
for analyses in Table 7 does not precisely equal the value of 0.2 or 0.3
of acceptance range in Table 10. Neither is the stepsize of 0.7 in Table
7 exactly equal to 0.693 used in Table 10. However, the values seemed
close enough to make a comparison, and the result of this comparison seemed
to indicate that 20 items as upper limit was adequate. Note that the
mean number of items recorded in both tables illustrated that the
procedure approached the upper limit in the middle range of θ.



Table 11

Means and Standard Deviations from
SIMIP on ETIPL Item Pool and Comparable
Ideal Item Pool

                                         Ability Level
Pool                    -3       -2       -1        0        1        2        3

ETIPL         X̄      -3.061   -2.175   -1.026   -0.065    1.029    1.950    2.913
              Sθ̂      0.561    0.460    0.573    0.469    0.516    0.696    0.533

Ideal         X̄       ?        ?        ?       -0.01?    1.148    2.222    3.070
              Sθ̂      ?        0.703    0.652    0.462    0.787    0.718    0.460

Note. All runs made with 25 administrations per ability level, 20
item upper limit, stepsize = 0.70, and acceptance range = 0.25.



Discussion 

It should be recalled that the basic emphasis of this study was to
investigate the operational characteristics of a one-parameter tailored
testing procedure when the item pool attributes (shape and size) and
the program parameters (stepsize and acceptance range) were varied. In
so doing, suggestions regarding the most preferred item pool and program
parameter values were found based upon analyses of the tailored testing
procedure's ability estimate bias and standard error at various points
along the ability continuum. This strategy for investigating bias and
standard error was motivated by the need to determine these values at
several θ levels across the continuum, since overall efforts on the
project were directed toward developing a criterion-referenced tailored
testing procedure. In criterion-referenced testing, it is essential to
identify effects of ability estimate bias and standard error on decisions
made at several points along the ability scale. The research presented
here, which was
-''f;"'- i'^" '>!'tii-.] it.M-' :hu)] t f t r j 'mj and [)roqram paramt'te.-r 



•. • ; t 



1 I'M, 1 : , ■ j t r_ 

I i ' 1 I 1 • , ! ^ ■ 

• I 

.■. i *! ' • 1 :. ! f 

' f . ' ■ ,' i )':■!■, J r .| ". 



' 1 ■ " ■ 'i I ' i;.Ul -rSCfV 



■no iii'f)-jr- 



' "•■ I' ■ ' (Mill'.'. Sf..) 1 . 
■ ^" "'• '■' "i-t i'- •■ ran«io;, it is i:;H)(.n- 1 !',t 






the continuum. In this regard, one should view the estimates of true 
ability +3.0 as understandably limited, in as much as the item pools did 
not have any items beyond difficulty +3.0. For best estimation of ability, 
the pool should have a dense uniform distribution of items around the 
ability level to be estimated. 

Item Pool Size 

The methods employed for the investigation of the effects of item
pool size on the operation of the one-parameter maximum likelihood tailored
testing procedure were simulations; but theoretical methods have also been
proposed. Lord (1970) suggested a formula for the number of items required
for a fixed stepsize procedure (selecting items more difficult by the
stepsize when correct responses were given and vice versa). The formula
is

$$N = \left(1 + \frac{R}{d}\right)\left(n - \frac{R}{2d}\right) \tag{2}$$

where ±R is the range of item difficulties desired, d is the stepsize
and a submultiple of R, and n is the maximum number of items to be
administered. For example, if R were plus three to minus three, d were set
at 0.5, and n were twenty, the formula would give

$$119 = \left(1 + \frac{3.0}{0.5}\right)\left(20 - \frac{3.0}{2 \times 0.5}\right). \tag{3}$$

With this set of values, 119 items would be required if the exact item
requested were to be available.

This formula does not directly apply to some tailored testing
procedures which use a variable rather than a fixed stepsize. Also, most
testing procedures allow administration of slightly discrepant items from
those requested by the procedure (the acceptance range specifies how
discrepant). Procedures using a variable stepsize tend to require more items
because, as the procedures converge to an ability estimate, the stepsize
in effect becomes smaller and smaller. Allowing items to be administered
which differ slightly from the requested item compensates to an extent
for the increase in number of items caused by the variable stepsize.
Another limitation of the formula is that several tailored testing procedures
administer items until a specified precision is reached instead of using
a preset maximum number of items as a stopping rule.

Another theoretical method of estimating how large an item pool should
be (llitt, 197b) is to determine the number of items required to reach
a specified precision of ability estimation, given that equally spaced,
perfectly discriminating items are available. With these ideal or optimal
circumstances, the precision of an ability estimate is equal to the difference
between adjacent items. For example, an item pool with seven equally
spaced items from -3.0 to +3.0 would classify examinees into categories
1.0 scale unit apart. The number of item responses required to make the
classification would be

$$k = \log_2 n \tag{3}$$




-25- 



vvher*^ n the su-e of the I'toir: pool, since 2*^ is' the number of branches 
in the troe diagram after k items are administered. By specifying the 
precision desired, e, the ri n i mum- i torn pool size can be determined by 
the range of ability, R, divided by e, plus one. 

< 

M iLtl (4) 

i'le ninii ui ■ numuer ot items administered to classify all ability levels 
in tiie tailored testing situation is 



k = log^I- + 1] . (5) 
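
As a numerical check on the formulas of this section, the following Python sketch evaluates equation (2) for Lord's example and equations (4) and (5) for an ability range of 7.0 (that is, -3.5 to +3.5). The function names are introduced here purely for illustration and are not taken from the programs used in the study; the classification interval of 0.5 is only an example value.

```python
import math


def lord_pool_size(half_range, stepsize, max_items):
    """Equation (2): items needed for a fixed-stepsize procedure.

    half_range is R (difficulties run from -R to +R), stepsize is d,
    and max_items is n, the maximum number of items administered.
    """
    return (1 + half_range / stepsize) * (max_items - half_range / (2 * stepsize))


def ideal_pool_size(ability_range, precision):
    """Equation (4): minimum pool size for equally spaced ideal items."""
    return ability_range / precision + 1


def min_session_length(ability_range, precision):
    """Equation (5): log base 2 of the minimum pool size."""
    return math.log2(ideal_pool_size(ability_range, precision))


# Lord's example: R = 3.0, d = 0.5, n = 20
print(lord_pool_size(3.0, 0.5, 20))                # 119.0

# Idealized pool over -3.5 to +3.5 with an (illustrative) interval of 0.5
print(ideal_pool_size(7.0, 0.5))                   # 15.0
print(round(min_session_length(7.0, 0.5), 1))      # 3.9
```

Because the session length in equation (5) grows only with the logarithm of the pool size, halving the classification interval roughly doubles the required pool but adds only about one item to the minimum session.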

Some results obtained by the application of the formulas based on
the theoretical method for estimating the number of items needed in a
pool, given the precision desired, are shown in Table 12. The
requirements for pool size were computed for the range of ability, -3.5
to +3.5, given the desired classification interval size. As has been
pointed out, these results are for a rectangular pool of hypothetical
items with perfect discrimination and zero guessing probabilities. With
these restrictions, the item pool sizes shown must be regarded as lower
limits. The minimum session length indicates the fewest number of items
that would have to be administered in order to classify an ability level
within the capabilities of the item pool. These also are based on
hypothetically perfect items and item pools, and should be considered as lower
limits. The values in the column labelled simulated length are the number
of items required to reach a best estimate using the most likely response
pattern simulation. All results in this column are based on θ = 0.0.



Table 12
Minimum Item Pool Requirements
for a Rectangular Idealized Pool Given
Classification Interval and Ability Range

Classification    Pool    Minimum           Simulated
Interval Size     Size    Session Length    Length*

0.5                 15    3.9               2
0.25                29    4.9               4
0.125               57    5.8               8
0.0625             113    6.8               8
0.03125            225    7.8               7

*Note. Number of items administered to reach the closest approximation of
the value within the classification interval.



In some cases the simulated length is less than the minimum session
length; this is a consequence of the ability level used. Setting the
stepsize equal to 0.693 tends to keep the process near the middle of the
item pool, speeding up convergence for abilities near 0.0. If an ability
of 3.0 had been used, the session length for classification interval .5
would have been 6, well over the minimum predicted value. Thus, the
minimum session length refers to the number of items needed across the
ability range, and under specified circumstances fewer items may be required.

These results using simulated tests have been compared to actual
tailored testing convergence plots and found to be fairly good
approximations (Reckase, 1976). One observation of importance is that,
from convergence plots, it can be seen that giving too many easy items
causes bias in ability estimation. Reckase (1975) has discussed this
effect in detail.



Stepsize

The investigation of the stepsize program parameter suggests that,
for tailored testing procedures using a fixed stepsize prior to having
both correct and incorrect responses in the examinee's response string, a
value in the range of .5 to 1.0 is most apt to minimize ability estimate
bias and standard error. To determine the precise stepsize value to use when
setting up a tailored testing procedure, one should look carefully at
the distribution of item difficulty of the particular item pool to be
used. The testing procedure should select the first item from the middle
of the pool. This item may not coincide with the most informative item
for θ = 0, since the median difficulty for the pool may not equal 0.
The next step is to tentatively set the stepsize equal to 0.7 and
determine whether items exist within the acceptance range at ±1, ±2, ±3,
and ±4 stepsizes away from the median difficulty item that the procedure
administered first. The purpose here is to avoid setting the stepsize
at a value which will induce ability estimates during initial testing
which will "fall through" the item pool (i.e., premature termination of
testing when no items exist within plus or minus the acceptance range of
the ability estimate). If the item difficulty distribution is uniformly
dense across the range of difficulty, this will not pose much of a problem.

Another consideration when setting the stepsize value is to make
it small enough to assure that items exist within an acceptance range of
±4 stepsizes away from the median difficulty item in the pool. This will
make the minimum number of items that would be administered equal to 5
for those who get all the items right or all the items wrong. Depending
on the above considerations, the stepsize value may be set lower or higher
than the recommended 0.7. As can be seen, the item pool size and
difficulty distribution, acceptance range, and stepsize interact in
determining the adequacy of the testing procedure.
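
The check for items at successive stepsizes from the median item can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure's own code: the function name, the use of the lower median as the starting item, and the evenly spaced 61-item pool are all assumptions made here.

```python
from statistics import median_low


def uncovered_steps(difficulties, stepsize=0.7, acceptance=0.3, max_steps=4):
    """Return the signed step counts at which the pool would "fall through".

    Starting from the median-difficulty item, look 1..max_steps stepsizes
    up and down and report each position where no item lies within the
    acceptance range of the requested difficulty.
    """
    start = median_low(difficulties)  # assumed rule for the first item
    missing = []
    for k in range(1, max_steps + 1):
        for sign in (1, -1):
            target = start + sign * k * stepsize
            if not any(abs(b - target) <= acceptance for b in difficulties):
                missing.append(sign * k)
    return missing


# Hypothetical rectangular pool: 61 items from -3.0 to +3.0 in steps of 0.1
pool = [round(-3.0 + 0.1 * i, 1) for i in range(61)]

print(uncovered_steps(pool))                                 # []
print(uncovered_steps(pool, stepsize=1.0, acceptance=0.2))   # [4, -4]
```

With a stepsize of 0.7 the fourth step lands at ±2.8, inside the hypothetical pool, so nothing is reported. With a stepsize of 1.0 the fourth step requests a difficulty of ±4.0, and an examinee answering the first four items all correctly or all incorrectly would run out of items.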

The reason for including 0.693 as a potentially optimal stepsize in
this study was that when the first Rasch procedure, using raw ability, was
set up at the University of Missouri, a multiplicative stepsize equal to
2 was used with good results. When the procedure was changed to operate
on log ability, an additive stepsize equal to log_e 2 seemed promising.
This study suggests that indeed log_e 2 = 0.693 was justifiably chosen for
the stepsize in the one-parameter tailored testing procedure.
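
The correspondence between the two scales follows from the logarithm of a product. As a brief sketch, assume the multiplicative step on the raw ability scale is a factor of 2 (the factor that an additive step of log_e 2 implies), and write A for raw ability and θ = log_e A for log ability; this notation is introduced here only for illustration.

$$\log_e(2A) = \log_e A + \log_e 2 \approx \theta + 0.693, \qquad \log_e(A/2) = \log_e A - \log_e 2 \approx \theta - 0.693.$$

Doubling or halving the raw ability estimate is therefore the same operation as adding or subtracting 0.693 on the log ability scale.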



As has been indicated in the discussion of stepsize, setting the
values of the program parameters (stepsize and acceptance range) should
take the characteristics of the item pool into account. If the item pool
has a uniform density of item difficulty, one may set the acceptance range
at a fairly low value (say 0.2). If there are gaps along the difficulty
continuum, the acceptance range should be set large enough to avoid
terminating the test due to lack of any item within an acceptance range of
the ability estimate. In general, an acceptance range equal to 0.3
appeared to satisfy the concerns of avoiding premature termination of
testing and also minimizing the error induced by administering
inappropriate items.

The program parameter denoted acceptance range is equivalent to
specifying a minimum item information cutoff. Table 13 indicates the
values of the information cutoffs for the acceptance ranges investigated
in this report. Many of the tailored testing systems presently in
operation compute the item information for each item in the pool given the
present ability estimate. For the one-parameter model, the information
function is maximized when the difficulty of the selected item is equal to
the ability estimate. For a discussion of information functions see
Birnbaum (1968).


Table 13
Comparable Information Cutoffs
for Acceptance Range Values

Acceptance Range    Information Cutoff

.1                  .249
.2                  .248
.3                  .244
.4                  .240
.5                  .235
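
The relation between an acceptance range and an information cutoff can be illustrated with a short Python sketch. It assumes the usual logistic form of the one-parameter model with no scaling constant, for which the item information at ability θ is P(1 - P); the function names are illustrative and not drawn from the study's programs.

```python
import math


def rasch_probability(theta, b):
    """Probability of a correct response under the one-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta, b):
    """Item information P(1 - P) for the one-parameter logistic model."""
    p = rasch_probability(theta, b)
    return p * (1.0 - p)


# Information of an item lying exactly one acceptance range away from theta
for acceptance in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"{acceptance:.1f}  {item_information(0.0, acceptance):.3f}")
# 0.1  0.249
# 0.2  0.248
# 0.3  0.244
# 0.4  0.240
# 0.5  0.235
```

Under these assumptions the computed values agree with the cutoffs listed in Table 13 for acceptance ranges of 0.1 through 0.5. The maximum of 0.25 is reached when the item difficulty equals the ability estimate.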



An explanation for the larger standard deviation given by the TREEIP
run on the rectangular pool at the more extreme values of the ability
continuum was suggested by a close look at the development of the
propensity distribution by the TREEIP for the various shaped item pools.
A property of the TREEIP and the manner in which it developed the
propensity distributions was that the standard deviation actually
increased as more branches or levels resulted from items administered to
more and more ability estimates. This increase of the standard deviation
of ability estimates stabilized for the smaller item pools as the paths
or branches of the "tree" terminated. For the larger pools (especially
the ... pools), the standard deviation initially increased, but as
branches were terminated the standard deviation came down. Figure 4
illustrates this property of the TREEIP when it was run on the 61 item
rectangular pool with θ set equal to zero, stepsize equal to 0.693, and
acceptance range equal to 0.30. This pattern of increasing standard
deviation of ability estimates during the early formulation of the
propensity distribution was evident for all shapes of the distributions of
items in the pools.

However, the patterns of convergence to the final standard deviations
yielded by the TREEIP were different for the various shapes of item pools
at different ability levels. Tables 1, 2, and 3 show a general tendency
for the standard deviations of ability estimates of the true abilities
zero and one to be larger for the normal and bimodal pools than for the
rectangular pools. But for ability levels two and three, the standard
deviations of ability estimates were generally larger for the rectangular
pools than for the normal and bimodal pools. This trend was consistent
across most of the TREEIP analyses. The explanation proposed was that,
because more items were available for administration to the more extreme
levels of ability (i.e., θ = 2 and θ = 3) when the rectangular pool was
used, the standard deviation of ability estimates was larger since the
standard error was more accurately estimated. The standard deviations
of the estimates from the normal and bimodal pools for these true ability
levels were smaller, since paths or branches were often terminated because
no items were available within the acceptance range of the estimated
abilities. In short, when fewer items were in the pool around a particular
true ability, there were fewer paths allowed to develop in the propensity
distribution due to the stopping rules. Therefore, the standard deviation
of ability estimates at that particular level was an underestimate. A
logical check for this phenomenon was the prediction that when the
acceptance range was made smaller, the drop in standard deviations for the
more extreme ability levels would be more pronounced with the normal pool
than with the rectangular. This did appear to be the case. The point is
that the smaller standard deviations for ability levels 2 and 3 yielded by
the TREEIP when normal or bimodal pools were used probably should not be
weighted too heavily, as the tendency appears to be somewhat of an
artifact of the procedure. The values obtained for the rectangular pools
may well be more realistic.



SIMIP

SIMIP was designed to score and administer items in the manner
previously described, based on the rationale that this approach was a
reasonable simulation of the behavior of an examinee when interacting with
a tailored test. The pseudo examinee with some specified true ability was
presented an item of average difficulty from the pool because, given that
we have no prior information about his ability, the best guess of an item
appropriate for the examinee was one of average difficulty. Scoring each
item by determining the probability of a correct response using the
examinee's θ in the one-parameter formula and then comparing this
probability to a random number selected from a rectangular distribution
between zero and one was deemed a reasonable simulation, assuming the
one-parameter model was correct. Clearly, the larger the probability of a
correct response was, the greater the chance was that the random number
generated was less than or equal to the probability specified by the model
of a correct response. However, there was ample provision for the reality
that occasionally an examinee with adequate ability to answer an item
correctly will still respond incorrectly, and conversely. While the
probability of a correct response was computed using the examinee's true
θ, item selection procedures used the fixed stepsize until both correct
and incorrect responses were present, and then selected items maximizing
information for the estimated ability. This approach constituted the
simulation of the interaction between examinee and tailored test with
respect to the SIMIP.
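
The scoring rule described for SIMIP can be sketched as follows. This is an illustrative Python reconstruction, not the SIMIP code itself: it assumes the logistic form of the one-parameter model with no scaling constant, and the function names, the seed, and the example values of θ and item difficulty are hypothetical.

```python
import math
import random


def p_correct(theta, b):
    """One-parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def simulate_response(theta, b, rng):
    """Score an item as correct (1) when a uniform draw on [0, 1)
    is less than or equal to the model probability, else incorrect (0)."""
    return 1 if rng.random() <= p_correct(theta, b) else 0


rng = random.Random(12345)  # fixed seed so the run is repeatable
responses = [simulate_response(1.0, 0.0, rng) for _ in range(10000)]
proportion = sum(responses) / len(responses)

print(round(p_correct(1.0, 0.0), 3))   # 0.731
print(proportion)                      # close to 0.731, varying with the seed
```

Over many simulated administrations the proportion of correct responses approaches the model probability, while any single response may still go against the examinee's ability, which is the provision for occasional unexpected responses noted above.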



Summary and Conclusions 

It should be kept in mind that this report focused primarily on
program parameters and item pool attributes as they interacted with the
one-parameter maximum likelihood tailored testing procedure currently in
operation for this research project. Clearly, the inferences drawn from
the results should generalize to other tailored testing applications using
similar conceptual formulations of operation. In this sense, the results
of this study were intended not as isolated studies of item pool size and
shape, stepsize magnitude, and value of the acceptance range, but rather
intended to generalize to fairly concrete statements about the preferred
operation of a one-parameter tailored testing procedure. As was expected,
item pool attributes and program parameters interacted to a great extent
in the determination of the degree of bias and amount of variance in
ability estimation. The intention in drawing up the numerous tables and
figures of this report was to illustrate trends of interaction among these
variables. These trends, in large part, were the primary thrust of this
report. They should be helpful in applying tailored testing procedures
in which some of the variables, such as item pool attributes, have been
fixed by practicality. An important consideration when using actual item
pools is that calibration of actual items provides estimates of item
parameters. Often these parameters have been obtained from a linking
performed on several separate analyses in order to get larger samples and
therefore more stable estimates of the difficulty values. (For a discussion
of linking techniques see Reckase, 1979.) When implementing tailored
testing, it must be assumed that the estimates of item difficulties contain
minimal error. If this assumption is not met, obviously error will
be introduced into the ability estimates based on these estimates of item
parameters. At least two major concerns influence the error in parameter
estimates: sample size and factorial complexity of the test. For the
vast majority of analyses in this report the item parameters have been
assumed to be known.

In conclusion, this paper was intended as a guide for those setting
up a tailored testing procedure. The paper does not, by any means, exhaust
all the inferences that could be drawn from this set of data. The numerous
tables have been included with the intention that they might serve as
aids in guiding the development of one-parameter tailored testing systems.




REFERENCES

Birnbaum, A. Some latent trait models and their use in inferring an
examinee's ability. In F. M. Lord and M. R. Novick, Statistical
theories of mental test scores. Reading, MA: Addison-Wesley, 1968.

Hitt. Personal communication, Washington, D.C., June, 1975.

Jensema, C. J. Bayesian tailored testing and the influence of item bank
characteristics. Paper presented at the Invitational Conference
on Adaptive Testing, Washington, D.C., June, 1975.

Lord, F. M. Some test theory for tailored testing. In W. H. Holtzman
(Ed.), Computer-assisted instruction, testing, and guidance. New
York: Harper and Row, 1970.

Lord, F. M., & Novick, M. R. Statistical theories of mental test scores.
Reading, MA: Addison-Wesley, 1968.

Patience, W. M. Description of components in tailored testing. Behavior
Research Methods and Instrumentation, 1977, 9(2), 153-157.

Pine, S. M., & Weiss, D. J. A comparison of the fairness of adaptive
and conventional testing strategies (Research Report 78-1). Minneapolis:
University of Minnesota, Department of Psychology, Psychometric
Methods Program, 1978.

Reckase, M. D. An interactive computer program for tailored testing based
on the one-parameter logistic model. Behavior Research Methods and
Instrumentation, 1974, 6(2), 208-212.

Reckase, M. D. The effect of item choice on ability estimation when
using a simple logistic tailored testing model. Paper presented
at the Annual Meeting of the American Educational Research Association,
Washington, D.C., March, 1975. (ERIC Document Reproduction Service
No. ...)

Reckase, M. D. Ability estimation and item calibration using the one
and three parameter logistic models: A comparative study (Research
Report 77-1). Columbia: University of Missouri, Department of
Educational Psychology, 1977.

Reckase, M. D. Item pool construction for use with latent trait models.
Paper presented at the Annual Meeting of the American Educational
Research Association, San Francisco, April, 1979.

Weiss, D. J. Strategies of adaptive ability measurement (Research Report
74-5). Minneapolis: Department of Psychology, University of Minnesota,
December, 1974.

Wright, B. D., & Panchapakesan, N. A procedure for sample-free item
analysis. Educational and Psychological Measurement, 1969, 29, 23-48.



Appendix A 



Figure A-1
Frequency Distribution
of Difficulty Values:
72 Item VCIPL Pool

[Histogram: frequency of items by difficulty value; horizontal axis labelled RANGE.]

Figure A-2
Frequency Distribution
of Difficulty Values:
180 Item ETIFI Pool

[Histogram: frequency of items by difficulty value; horizontal axis labelled RANGE.]



